Abstract. The problems studied in this paper originate from Graph Motif, a problem introduced in 2006 in the context of biological networks. Informally speaking, it consists of deciding whether a multiset of colors occurs in a connected subgraph of a vertex-colored graph. Due to the high rate of noise in biological data, more flexible definitions of the problem have been outlined. We present in this paper two inapproximability results for two different optimization variants of Graph Motif. We also study another definition of the problem, in which the connectivity constraint is replaced by modularity. While the problem stays NP-complete, it admits FPT algorithms for biologically relevant parameterizations.

1 Introduction 

A recent field in bioinformatics focuses on biological networks, which represent interactions between different elements (e.g. between amino acids, between molecules or between organisms) [1]. Such a network can be modeled by a vertex-colored graph, where nodes represent elements, edges represent interactions between them and colors give functional information on the graph nodes. Using biological networks allows a better characterization of species, by determining small recurring subnetworks, often called motifs. Such motifs can correspond to a set of nodes realizing the same function, which may have been evolutionarily preserved [22]. It is thus crucial to determine these motifs to identify common elements between species and transfer the biological knowledge.

Historically, motifs were defined by a set of node labels with a given topology (e.g. a path, a tree, a graph). The algorithmic problem was thus to find an occurrence of the motif in the network that respects both the label set and the given topology. This leads to problems roughly equivalent to subgraph isomorphism, a computationally difficult problem. However, in metabolic networks, similar topologies can represent very different functions [17]. Moreover, in protein-protein interaction (PPI) networks, information about the topology of motifs is often missing [5]. There is also a high rate of false positives and false negatives in such networks [10]. Therefore, in some situations, topology is irrelevant, which leads to searching for functional motifs instead of topological ones. In this setting, we still ask for the conservation of the node labels, but we replace topology conservation by the weaker requirement that the subnetwork should form a connected subgraph of the target graph. This approach was proposed by Lacroix et al., defining Exact Graph Motif [17].



• Input: A graph $G = (V, E)$, a set of colors $C$, a function $col : V \to C$, a multiset $M$ over $C$, an integer $k$.

• Output: A subset $V' \subseteq V$ such that (i) $|V'| = k$, (ii) $G[V']$ is connected, and (iii) $col(V') = M$.
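
To make the three conditions concrete, here is a minimal sketch of a verifier for a candidate subset. It is only an illustration: the function name `is_exact_motif_occurrence`, the adjacency-dictionary representation and the toy graph are assumptions, not part of the original problem statement, and $k$ is taken to be the size of the motif.

```python
from collections import Counter, deque


def is_exact_motif_occurrence(adj, col, motif, subset):
    """Check conditions (i)-(iii) of Exact Graph Motif for a candidate subset.

    adj:    dict mapping each node to an iterable of its neighbours (assumed format)
    col:    dict mapping each node to its color
    motif:  list of colors, read as a multiset; k is taken as len(motif)
    subset: iterable of nodes, the candidate V'
    """
    subset = set(subset)
    # (i) |V'| = k
    if not subset or len(subset) != len(motif):
        return False
    # (iii) the multiset of colors of V' equals the motif
    if Counter(col[v] for v in subset) != Counter(motif):
        return False
    # (ii) G[V'] is connected: breadth-first search restricted to the subset
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in subset and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == subset


# Hypothetical toy instance: the path a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
col = {"a": "red", "b": "blue", "c": "red", "d": "green"}
```

On this toy path, the subset {a, b, c} is an occurrence of the multiset motif {red, red, blue}, whereas {a, c} is rejected for the motif {red, red} because it is not connected. Verifying a candidate is easy; the hardness discussed in the text lies in finding one.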

In the following, the motif is said to be colorful if $M$ is a set (it is a multiset otherwise). Note that this problem also has applications in the context of mass spectrometry [4], and may be used in social or technical networks [3,23].

Not surprisingly, the problem remains NP-complete, even under strong restrictions (when $G$ is a bipartite graph with maximum degree 4 and $M$ is built over two colors only [11], or when $M$ is colorful and $G$ is a rooted tree of depth 2 [2] or a tree of maximum degree 3 [11]). However, for general trees and multiset motifs, the problem can be solved in $O(n^{2c+2})$ time, where $c$ is the number of distinct colors in $M$, while being W[1]-hard for the parameter $c$ [11]. We also point out that the problem can be solved in polynomial time if the number of colors in $M$ is bounded and if $G$ is of bounded treewidth [11]. It is also polynomial if $G$ is a caterpillar [2], or if the motif is colorful and $G$ is a tree where each color appears at most twice. This last result is mentioned in [9] and can be retrieved by an easy transformation to a 2-SAT instance (chapter 4 of [23]).

The difficulty of this problem is counterbalanced by its fixed-parameter tractability when the parameter is $k$, the size of the solution [17,11,3,5,14,13] (for information about parameterized complexity, one can for example see [18]). The currently fastest FPT algorithms for Exact Graph Motif run in $O^*(2^k)$ time for the colorful case, $O^*(4^k)$ time for the multiset case, and in both cases use polynomial space [13] (the $O^*$ notation suppresses polynomial factors). A recent paper shows that the problem is unlikely to admit polynomial kernels, even on restricted classes of trees [2].

To deal with the high rate of noise in biological data, different variants of Exact Graph Motif have been introduced. The approach of Dondi et al. requires a solution with a minimum number of connected components [8], while the one of Betzler et al. asks for a 2-connected solution [3]. As for traditional bioinformatics problems, some colors can be inserted in a solution, or conversely, some colors of the motif can be deleted in a solution [5,8,13]. Recently, Dondi et al. introduced a variant in which the number of substitutions between colors of the motif and colors in the solution must be minimum [9].

Following this direction, we consider in Section 2 an approximation issue when one wants to maximize the size of the solution. In Section 3, we propose an inapproximability result when one wants to minimize the number of substitutions. Finally, we present in Section 4 a new requirement concerning the connectedness of the solution, with one hardness result and two fixed-parameter tractable (FPT) algorithms. Due to space constraints, most proofs are deferred to the appendix.
2 Maximizing the solution size

To deal with the high rate of noise in the biological data, one approach allows some colors of the motif to be deleted from the solution, leading to Max Graph Motif, a problem introduced by Dondi et al. [8].

• Input: A graph $G = (V, E)$, a set of colors $C$, a function $col : V \to C$, a multiset $M$ over $C$.

• Output: A subset $V' \subseteq V$ such that (i) $G[V']$ is connected, and (ii) $col(V') \subseteq M$.

• Measure: The size of $V'$.

In a natural decision form, we are also given an integer $k$ in the input and one looks for a solution of size $k$ (the number of deletions is thus equal to $|M| - k$). The problem is known to be in the FPT class for parameter $k$ [8,5,13]. Concerning its approximation, Max Graph Motif is APX-hard, even when $G$ is a tree of maximum degree 3, the motif is colorful and each color appears at most twice in $G$ (recall that, under the same conditions, Exact Graph Motif is polynomial [9]). Moreover, there is no constant approximation ratio unless P = NP, even when $G$ is a tree and $M$ is colorful [8].

In the following, we answer an open question of Dondi et al. [8] concerning the approximation issue of the problem when $G$ is a tree where each color occurs at most twice. To do so, we use a reduction from Max Independent Set, a problem stated as follows: Given a graph $G_I = (V_I, E_I)$, find the maximum subset $V'_I \subseteq V_I$ such that there are no two nodes $u, v \in V'_I$ with $\{u, v\} \in E_I$. Our proof proceeds in four steps. We first describe the construction of the instance $I' = (G, C, col)$ for Max Graph Motif from the instance $I = (G_I)$ of Max Independent Set (we consider the motif as $M = C$). We next prove that we can construct in polynomial time a solution for $I'$ from a solution for $I$ and, conversely, that we can construct in polynomial time a solution for $I$ from a solution for $I'$. Finally, we show that if there is an approximation algorithm with ratio $r$ for Max Graph Motif, then there is an approximation algorithm with ratio $r$ for Max Independent Set.

Before stating the reduction, consider a total order over the edges of $G_I$. We then define a function $\mathrm{adj} : V_I \to 2^{E_I}$ giving, for a node $v \in V_I$, the ordered list of edges in which $v$ is involved (thus of size $d(v)$, the degree of $v$). With this order, $\mathrm{adj}(v)[i]$ gives the $i$-th edge in which $v$ is involved. From the graph $G_I = (V_I, E_I)$, we build the graph $G = (V, E)$ as follows (see also Figure 1):

- $V = \{r\} \cup \{v_i^e : 1 \le i \le |V_I|,\ e \in \mathrm{adj}(v_i)\} \cup \{v_i^j : 1 \le i \le |V_I|,\ 1 \le j \le |V_I|^2\}$,

- $E = \{\{r, v_i^{\mathrm{adj}(v_i)[1]}\} : 1 \le i \le |V_I|\} \cup \{\{v_i^{\mathrm{adj}(v_i)[j]}, v_i^{\mathrm{adj}(v_i)[j+1]}\} : 1 \le i \le |V_I|,\ 1 \le j < d(v_i)\} \cup \{\{v_i^j, v_i^{j+1}\} : 1 \le i \le |V_I|,\ 1 \le j < |V_I|^2\}$.

Informally speaking, $r$ is the root of $G$. There are $|V_I|$ paths connected to $r$. Each path represents a node of $G_I$ and is of length $d(v_i) + |V_I|^2$. Observe that
Fig. 1: Construction of $G$ from an instance $G_I = (V_I, E_I)$ of Max Independent Set. For ease, only the colors of the nodes of $G$ (not the labels) are given. From a solution in $G_I$ in bold, the solution for Max Graph Motif is given in bold in $G$.



$|V| = 1 + 2|E_I| + |V_I| \cdot |V_I|^2$ (there are two nodes involved in each edge, therefore $\sum_{v \in V_I} d(v) = 2|E_I|$). Let us now describe the set $C$ of colors and the coloration function $col : V \to C$. The set of colors is $C = \{c_r\} \cup \{c_e : e \in E_I\} \cup \{c_i^j : 1 \le i \le |V_I|,\ 1 \le j \le |V_I|^2\}$. Considering $M = C$, the motif is colorful. Coloration of the nodes of $G$ is done as follows: $col(r) = c_r$; $\forall e = \{v_i, v_j\} \in E_I$, $col(v_i^e) = col(v_j^e) = c_e$; $\forall 1 \le i \le |V_I|,\ 1 \le j \le |V_I|^2$, $col(v_i^j) = c_i^j$. In other words, for each edge $e = \{v_i, v_j\}$, a copy of the node $v_i$ and a copy of the node $v_j$ have the same color $c_e$, the one of the edge. Moreover, $r$ and the nodes $v_i^j$ all have different colors. In fact, nodes $\{v_i^j : 1 \le i \le |V_I|,\ 1 \le j \le |V_I|^2\}$ can be considered as "black boxes", which are given for free. We clearly observe that, by construction, $G$ is a tree where each color appears at most twice. Let us show how to build a solution for $I'$ from a solution for $I$, and vice versa.
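
The construction can be sketched in a few lines of code. This is an illustrative sketch only: the function names, the tuple-based node labels and the input format are hypothetical, and the code assumes that, within each branch, the last edge node is joined to the first "black box" node (and that the black boxes of an isolated node hang directly from the root), so that every branch forms a single path as described informally above.

```python
from collections import Counter


def build_max_motif_instance(n, edges):
    """Build the tree G and its coloring from G_I (nodes 0..n-1).

    edges: list of pairs; the list order plays the role of the total order on E_I.
    Returns (tree, col) where tree maps each node to its set of neighbours.
    """
    tree = {"r": set()}
    col = {"r": ("c_r",)}

    def link(u, v):
        tree.setdefault(u, set()).add(v)
        tree.setdefault(v, set()).add(u)

    for i in range(n):
        prev = "r"
        # one node per edge incident to v_i, in the chosen order (this is adj(v_i))
        for e in (e for e in edges if i in e):
            node = ("edge", i, e)
            col[node] = ("c_e", e)        # shared with the copy on the other endpoint
            link(prev, node)
            prev = node
        # |V_I|^2 "black box" nodes, each with its own private color
        for j in range(1, n * n + 1):
            node = ("box", i, j)
            col[node] = ("c_box", i, j)
            link(prev, node)
            prev = node
    return tree, col


def max_color_multiplicity(col):
    """Largest number of nodes sharing one color."""
    return max(Counter(col.values()).values())


# Hypothetical input: a triangle on three nodes
tree, col = build_max_motif_instance(3, [(0, 1), (1, 2), (0, 2)])
```

For the triangle, the sketch yields $1 + 2 \cdot 3 + 3 \cdot 9 = 34$ nodes and 33 edges, and every color occurs at most twice, in line with the counts stated in the text.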

Lemma 1. If there is a solution $V'_I \subseteq V_I$ for $I$, then there is a solution $V' \subseteq V$ for $I'$ such that $|V'| \ge |V'_I| \cdot |V_I|^2$.

Proof (Sketch). We build $V'$ as follows: $V' = \{r\} \cup \{v_i^e, v_i^j : v_i \in V'_I,\ e \in \mathrm{adj}(v_i),\ 1 \le j \le |V_I|^2\}$. In other words, we add in $V'$ the root $r$ of the tree and all the paths corresponding to the nodes of $V'_I$ (see also Figure 1). □

Lemma 2. If there is a solution $V' \subseteq V$ for $I'$, then there is a solution $V'_I \subseteq V_I$ for $I$ such that $|V'_I| \ge \left\lceil \frac{|V'| - 2|E_I| - 1}{|V_I|^2} \right\rceil$.

Proof (Sketch). For each $1 \le i \le |V_I|$, we add $v_i$ in $V'_I$ iff all the nodes $v_i^j$, $1 \le j \le |V_I|^2$, and $v_i^e$, $e \in \mathrm{adj}(v_i)$, are in $V'$. In other words, we add $v_i$ in $V'_I$ if the whole path corresponding to this node is in $V'$. □

These two lemmas lead to the main result of this section. 
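Proposition 1. Unless P = NP, there is no approximation ratio lower than $|V|^{1/3 - \epsilon}$ for Max Graph Motif, for any $\epsilon > 0$, even when the motif is colorful and $G$ is a tree where each color of $C$ appears at most twice.

Proof. Suppose there is such a ratio $r$ for Max Graph Motif. Then, there is an approximate solution $V'_{apx}$ which, compared to the optimal solution $V'_{opt}$, is of size $|V'_{apx}| \ge \frac{|V'_{opt}|}{r}$. We denote by $V'_{Iopt}$ and $V'_{Iapx}$ the optimal and approximate solutions for Max Independent Set.

With Lemma 1, $|V'_{opt}| \ge |V'_{Iopt}| \cdot |V_I|^2$.

We supposed $|V'_{apx}| \ge \frac{|V'_{opt}|}{r}$.

Therefore, $|V'_{apx}| \ge \frac{|V'_{Iopt}| \cdot |V_I|^2}{r}$.

With Lemma 2, $|V'_{Iapx}| \ge \left\lceil \frac{|V'_{apx}| - 2|E_I| - 1}{|V_I|^2} \right\rceil$.

Which leads to $|V'_{Iapx}| \ge \left\lceil \frac{(|V'_{Iopt}| \cdot |V_I|^2)/r - 2|E_I| - 1}{|V_I|^2} \right\rceil$.

Since $\frac{2|E_I| + 1}{|V_I|^2} < 1$, $|V'_{Iapx}| \ge \left\lceil \frac{(|V'_{Iopt}| \cdot |V_I|^2)/r}{|V_I|^2} \right\rceil - 1 = \left\lceil \frac{|V'_{Iopt}|}{r} \right\rceil - 1$.

Thereby, if there is an approximation algorithm with ratio $r$ for Max Graph Motif, there is an approximation algorithm with ratio $r$ for Max Independent Set. We conclude the proof by observing that $|V| = O(|V_I|^3)$ and that, unless P = NP, there is no ratio lower than $|V_I|^{1 - \epsilon}$ for Max Independent Set, $\forall \epsilon > 0$ [24]. □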



3 Minimizing the number of substitutions 

In this section, we focus on Min Substitute Graph Motif, a problem recently introduced by Dondi et al. [9]. In this variant, some colors of the motif can be deleted, but the size of the solution must be equal to $|M|$. Therefore, the deleted colors must be substituted by the same number of colors.

• Input: A graph $G = (V, E)$, a set of colors $C$, a function $col : V \to C$, a multiset $M$ over $C$.

• Output: A subset $V' \subseteq V$ such that (i) $|V'| = |M|$ and (ii) $G[V']$ is connected.

• Measure: The number of substitutions to get $M$ from $col(V')$.

Dondi et al. [9] prove that Min Substitute Graph Motif is NP-hard, even when $G$ is a tree of maximum degree 4 where each color occurs at most twice and the motif is colorful. On the positive side, they prove that the problem is in the FPT class when the parameter is the size of the solution.

Unfortunately, even in restrictive conditions (when $G$ is a tree of depth 2 and the motif is colorful), we prove that there is no approximation ratio within $c \log |V|$, for a constant $c$. To prove this inapproximability result, we give an L-reduction from Min Set Cover [20], a problem stated as follows: Given a set $X = \{x_1, x_2, \dots, x_{|X|}\}$ and a collection $S = \{S_1, S_2, \dots, S_{|S|}\}$ of subsets of $X$, find the minimum subset $S' \subseteq S$ such that every element of $X$ belongs to at least one member of $S'$. We denote by $e(i, j)$ the index $l$ such that $x_l$ corresponds to the $j$-th element of $S_i$. We first describe the polynomial construction of $I' = (G, C, col, M)$, an instance of Min Substitute Graph Motif, from $I = (X, S)$, any instance of Min Set Cover. From an instance $I$, let us build $G = (V, E)$ as follows (see also Figure 2):

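- $V = \{r\} \cup \{v_i : 1 \le i \le |S|\} \cup \{v_{i,j,t} : 1 \le i \le |S|,\ 1 \le j \le |S_i|,\ 1 \le t \le |S| + 1\}$,

- $E = \{\{r, v_i\} : 1 \le i \le |S|\} \cup \{\{v_i, v_{i,j,t}\} : 1 \le i \le |S|,\ 1 \le j \le |S_i|,\ 1 \le t \le |S| + 1\}$.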




Fig. 2: Illustration of the construction of an instance of Min Substitute Graph Motif from an instance of Min Set Cover such that $X = \{x_1, x_2, x_3\}$ and $S = \{\{x_1, x_2\}, \{x_2, x_3\}, \{x_2\}\}$. For ease, only the color of each node of the graph (and not the label) is given. The associated motif is $M = \{c_r\} \cup \{c_{k,t} : 1 \le k \le 3,\ 1 \le t \le 4\}$. A possible solution (with two substitutions) is given in bold.



Informally speaking, $r$ is the root of a tree with $|S|$ children, one for each subset of $S$. Each child $v_i$, $1 \le i \le |S|$, has $(|S| + 1) \cdot |S_i|$ children, corresponding to $|S| + 1$ copies of each element of $S_i$. The set of colors is $C = \{c_r\} \cup \{c_i : 1 \le i \le |S|\} \cup \{c_{k,t} : 1 \le k \le |X|,\ 1 \le t \le |S| + 1\}$. The coloring function is such that the root has a unique color, i.e. $col(r) = c_r$. Each node $v_i$ is colored with the unique color corresponding to the subset of $S$, $col(v_i) = c_i$, $\forall 1 \le i \le |S|$. Each node $v_{i,j,t}$ gets the color of the copy of the represented element, i.e. $col(v_{i,j,t}) = c_{e(i,j),t}$. Finally, the motif is $M = \{c_r\} \cup \{c_{k,t} : 1 \le k \le |X|,\ 1 \le t \le |S| + 1\}$. Observe that the colors $\{c_i : 1 \le i \le |S|\}$ are not in the motif (which is colorful by construction). Let us now show how to build a solution for $I'$ from a solution for $I$, and vice versa.
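
As an illustration, the depth-2 tree, its coloring and the motif can be generated as follows. This is a sketch under assumptions: the function name, the tuple encoding of nodes and colors, and the representation of each subset as a list of element indices are all hypothetical choices made for the example.

```python
def build_min_substitute_instance(num_elements, subsets):
    """Build (edges, col, motif) from a Min Set Cover instance.

    num_elements: |X|, with elements indexed 1..|X|
    subsets:      list of lists of element indices; subsets[i-1] plays the role of S_i
    """
    m = len(subsets)                      # |S|
    edges = []
    col = {"r": ("c_r",)}
    for i, s_i in enumerate(subsets, start=1):
        v_i = ("v", i)
        col[v_i] = ("c_sub", i)           # color c_i, which is not in the motif
        edges.append(("r", v_i))
        for j, elem in enumerate(s_i, start=1):   # elem plays the role of e(i, j)
            for t in range(1, m + 2):             # |S| + 1 copies
                leaf = ("v", i, j, t)
                col[leaf] = ("c", elem, t)        # color c_{e(i,j),t}
                edges.append((v_i, leaf))
    motif = {("c_r",)} | {
        ("c", k, t) for k in range(1, num_elements + 1) for t in range(1, m + 2)
    }
    return edges, col, motif


# The instance of Figure 2: X = {x1, x2, x3}, S = {{x1, x2}, {x2, x3}, {x2}}
edges, col, motif = build_min_substitute_instance(3, [[1, 2], [2, 3], [2]])
```

For the instance of Figure 2 this gives a tree with $1 + 3 + 5 \cdot 4 = 24$ nodes and a colorful motif of size $1 + 3 \cdot 4 = 13$, matching the motif given in the caption.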
Lemma 3. If there is a solution $S'$ for an instance $I$ of Min Set Cover, there is a solution for the instance $I'$ of Min Substitute Graph Motif with $|S'|$ substitutions.

Lemma 4. From a solution for the instance $I'$ of Min Substitute Graph Motif with at most $s$ substitutions, there is a solution for the instance $I$ of Min Set Cover of size at most $s$.

Proof. Let $V' \subseteq V$ be a solution for $I'$ such that we can obtain $M$ from $col(V')$ with at most $s$ substitutions. We can suppose that $s < |S| + 1$; otherwise, $S' = S$ is a solution of correct size. A solution for $I$ is built as follows: $S' = \{S_i : v_i \in V'\}$. If for some $1 \le k \le |X|$ there is no color of the set $\{c_{k,t} : 1 \le t \le |S| + 1\}$ in the solution, it means that these $|S| + 1$ colors have all been substituted, which contradicts the supposed maximum number of $s$ substitutions. Therefore, for each $1 \le k \le |X|$, there is at least one color from the set $\{c_{k,t} : 1 \le t \le |S| + 1\}$ in the solution. Thus, since the solution is connected, all elements of $X$ are covered by $S'$. Finally, the size of $S'$ is bounded by $s$. Indeed, since their colors are not in the motif, there are at most $s$ nodes $v_i$ in $V'$. □

We can now state the main result of this section. 

Proposition 2. Unless P = NP, there is no polynomial approximation algorithm for Min Substitute Graph Motif with a ratio lower than $c \log |V|$, where $c$ is a constant, even when the motif is colorful and $G$ is a tree of depth 2.

As a corollary of Proposition 2, we remark that the reduction is also a parameterized reduction. Since Min Set Cover is W[2]-hard if parameterized by the number of subsets in the solution [18], Min Substitute Graph Motif is also W[2]-hard if parameterized by the number of substitutions.

Corollary 1. Min Substitute Graph Motif is W[2]-hard when parameterized by the number of substitutions.

4 Using modularity 

In this section, we introduce a variant of Exact Graph Motif, where the connectivity constraint is replaced by modularity. After a quick recall of the properties of modules, we justify this new variant. The problem stays NP-hard; however, the tools offered by modularity allow efficient algorithms.

4.1 Definitions and properties 

In an undirected graph $G = (V, E)$, a node $x$ separates two nodes $u$ and $v$ iff $\{x, u\} \in E$ and $\{x, v\} \notin E$. A module of a graph $G$ is a set $\mathcal{M}$ of nodes not separated by any node of $V \setminus \mathcal{M}$. In other words, a module $\mathcal{M}$ is such that $\forall x \notin \mathcal{M}, \forall u, v \in \mathcal{M}$, $\{x, u\} \in E \Leftrightarrow \{x, v\} \in E$ [6]. The whole set of nodes $V$ and any singleton set $\{u\}$, where $u \in V$, are the trivial modules. Before stating the definition of specific modules, let us say that two modules $A$ and $B$ overlap if (i) $A \cap B \ne \emptyset$, (ii) $A \setminus B \ne \emptyset$, and (iii) $B \setminus A \ne \emptyset$. According to [6], if two modules $A$ and $B$ overlap, then $A \cap B$, $A \cup B$ and $(A \cup B) \setminus (A \cap B)$ are also modules. This allows the definition of strong modules. A module is strong if no other module overlaps it; otherwise it is weak. Therefore, two strong modules are either nested or disjoint. A module $\mathcal{M} \subset S$ is said to be maximal for a given set of nodes $S$ (by default the set of nodes $V$) if there is no module $\mathcal{M}'$ s.t. $\mathcal{M} \subset \mathcal{M}' \subset S$. In other words, the only module which properly contains the maximal module $\mathcal{M}$ is $S$.
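
The definition of a module translates directly into a short test: a set is a module when every outside node is adjacent either to all of its nodes or to none of them. The sketch below is illustrative only; the function name `is_module` and the dictionary-of-sets graph representation are assumptions.

```python
from itertools import combinations


def is_module(adj, candidate):
    """Return True iff no node outside `candidate` separates two of its nodes.

    adj: dict mapping each node to the set of its neighbours (undirected graph).
    """
    candidate = set(candidate)
    for x in adj:
        if x in candidate:
            continue
        inside = adj[x] & candidate
        # x separates two nodes of the candidate if it sees some but not all of them
        if inside and inside != candidate:
            return False
    return True


# Hypothetical examples: the path a - b - c and the clique K_3
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
clique = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}

# Number of non-empty node subsets of K_3 that are modules
clique_modules = sum(
    is_module(clique, subset)
    for size in range(1, 4)
    for subset in combinations(clique, size)
)
```

On the path, {a, c} is a module (b is adjacent to both nodes) while {a, b} is not (c is adjacent to b only). In the clique, every non-empty subset passes the test, which illustrates why the number of modules can be exponential.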

There are three types of modules: (i) parallel, when the subgraph induced by the nodes of the module is not connected (it is a parallel composition of its connected components), (ii) series, when the complement of the subgraph induced by the nodes of the module is not connected (it is a series composition of the connected components of its complement), or (iii) prime, when both the subgraph induced by the nodes of the module and its complement are connected.

The inclusion order of the maximal strong modules defines the modular tree decomposition $T(G)$ of $G$, which is enough to store the whole set of strong modules. The tree $T(G)$ can be recursively built by a top-down approach, where the algorithm recurses on the graph induced by the considered strong module. The root of this tree is the set of all nodes $V$ while the leaves are the singleton sets $\{u\}$, $\forall u \in V$. Each node of $T(G)$ has a label representing the type of the strong module: parallel, series or prime. Children of an internal node $\mathcal{M}$ are the maximal submodules of $\mathcal{M}$ (i.e. they are disjoint). The modular tree decomposition can be obtained with a linear-time algorithm (e.g. the one described in [15]). We can now introduce an essential property of $T(G)$:

Theorem 1. ([6]) A module of $G$ is either a node of $T(G)$, or a union of children (of depth 1) of a series or parallel node in $T(G)$.

One can see strong modules as generators of the modules of $G$: the set of all modules of $G$ can be obtained from the tree $T(G)$. A crucial point to note is that there is potentially an exponential number of modules in a graph (e.g., the clique $K_n$ has $2^n$ modules), but the size of $T(G)$ is $O(n)$ (more precisely, $T(G)$ has fewer than $2n$ nodes since there are $n$ leaves and no node with exactly one child). Therefore, the exponential-sized family of modules of $G$ can be represented by the linear-sized tree $T(G)$.

4.2 When modules join Graph Motif 

In the following, we investigate the algorithmic issues of another topology-free definition, obtained by replacing the connectedness demand by modularity. Following the definition of Exact Graph Motif, we introduce Module Graph Motif.

• Input: A graph $G = (V, E)$, a set of colors $C$, a function $col : V \to C$, a multiset $M$ on $C$ of size $k$.

• Output: A subset $V' \subseteq V$ such that (i) $V'$ is a module of $G$ and (ii) $col(V') = M$.
This definition links the modularity demand with the motif search. The module definition implies that all the nodes in this module have a uniform relation with the set of all the other nodes outside of the module. The module nodes are indistinguishable from the outside: they act similarly with the other nodes of the graph.

The authors of [1] define a biological module as a set of elements having a separable function from the rest of the graph. Similarly, the authors of [19] describe a biological module as a set of elements with an identifiable task, separable from the functions of the other biological modules. Moreover, it is shown in [7] that genes with a similar neighborhood are likely to be in the same biological process. It is thus possible that the set of nodes in an algorithmic module of a graph representing a biological network has a common biological function. Also note that the authors of [21] describe modules in gene regulatory networks as groups of genes which obey the same regulations and, consequently, as groups whose members cannot be distinguished from the rest of the network.

Moreover, apart from using modules with a slightly different goal (in order to predict the results of PPI more cleverly), Gagneur et al. [12] note that modules of a graph can match biological modules, and consider modular decomposition as a general tool for biological network analysis under different representations (oriented graphs, hypergraphs...).

However, there is no clear definition of what is (or should be) a biological module in a network [1]. We thus claim that the approach using modular decomposition is complementary to the previous definitions of biological modules (e.g. connected occurrences or compact occurrences).

4.3 Difficulty of the problem 

Unfortunately, Module Graph Motif is NP-hard, even under strong restrictions, i.e. when $G$ is a collection of paths of size three, and when the motif is colorful. Observe that under the same conditions, Exact Graph Motif is trivially polynomial-time solvable.

Proposition 3. Module Graph Motif is NP-complete even if $G$ is a collection of paths of size 3 and $M$ is colorful.

Proof (Sketch). To prove the hardness of Module Graph Motif, we propose a reduction from the NP-complete problem Exact Cover by 3-Sets (X3C), stated as follows: Given a set $X = \{x_1, \dots, x_{3q}\}$ and a collection $S = \{S_1, \dots, S_{|S|}\}$ of 3-element subsets of $X$, does $S$ contain a subcollection $S' \subseteq S$ such that each element of $X$ occurs in exactly one element of $S'$?

Let us now describe the construction of an instance $I' = (G, C, col)$ of Module Graph Motif from an arbitrary instance $I = (X, S)$ of X3C. The graph $G = (V, E)$ is built as follows: $V = \{v_i^j : 1 \le i \le |S|,\ x_j \in S_i\}$, $E = \{\{v_i^1, v_i^2\}, \{v_i^2, v_i^3\} : 1 \le i \le |S|\}$. Informally speaking, $G$ is a collection of $|S|$ paths with three nodes (recall that for each $1 \le i \le |S|$, $|S_i| = 3$). The set of colors is $C = \{c_j : 1 \le j \le |X|\}$. The coloration of $G$ is such that $col(v_i^j) = c_j$. In other words, each node gets the color of the represented element of $X$. We also consider the colorful motif as $M = C$. □

4.4 Algorithms for the decision problem 

Even if the problem is hard under strong restrictions, the modular decomposition tree is a useful structure to design efficient algorithms. More precisely, we show in the sequel that Module Graph Motif is in the FPT class when the parameter is the size of the solution, with a better complexity than for Exact Graph Motif when the motif is a multiset. As a corollary, we show that the problem can be solved in polynomial time if the number of colors is bounded. Moreover, Module Graph Motif is still in the FPT class if a set of colors is associated to each node of the graph.

Let us first observe that asking for a strong module instead of any module in 
the definition of Module Graph Motif leads to a linear algorithm. Indeed, 
one can just browse T{G) and test if the set of colors for each strong module is 
equal to the motif. 

Let us now show an algorithm with a time complexity of $O^*(2^k)$, where $k$ is the size of the solution, for Module Graph Motif, even if the motif is a multiset. To the best of our knowledge, there is no known algorithm with a time complexity lower than $O^*(4^k)$ for Exact Graph Motif when the motif is a multiset.

Proposition 4. There is an algorithm for Module Graph Motif with a time complexity of $O(2^k |V|^2)$ and a space complexity of $O(2^k |V|)$, where $k$ is the size of the motif and of the solution.

Proof (Sketch). Since $|M| = k$, observe that there are at most $2^k$ different multisets $M'$ such that $M' \subseteq M$. We first build in polynomial time the modular tree decomposition $T(G)$ from $G$. We repeat the following algorithm for each node $\mathcal{M}$ of $T(G)$.

We start by testing if the set of colors of $\mathcal{M}$ is exactly equal to the motif $M$. If it is the case, the algorithm terminates. Otherwise, if $\mathcal{M}$ is a series or parallel node, a module can be a union of its children. Given an arbitrary order on its $t$ children, denote by $\mathrm{Child}(\mathcal{M})[i]$ the $i$-th child of $\mathcal{M}$. We then delete all children $\mathcal{M}'$ of $\mathcal{M}$ such that $col(\mathcal{M}') \not\subseteq M$, where $col(\mathcal{M}')$ is the set of colors of the nodes of $\mathcal{M}'$. Indeed, such a child cannot be in a solution considering $M$. We note that the set of colors of each remaining child corresponds to a multiset $M' \subseteq M$. Since any union of children of $\mathcal{M}$ is a module of $G$, it is thus a potential solution. We propose to test by dynamic programming if such a union corresponds to a solution for $M$. We build a table $D(i, M')$, for $0 \le i \le t$ and $M' \subseteq M$. Therefore, $D$ has $t + 1$ lines and $2^k$ columns. We fill this table as follows:

$D(0, M') = \text{True}$ if $M' = \{0, \dots, 0\}$ (i.e. $M'$ is empty), False otherwise,

$D(i, M') = D(i - 1, M') \lor D(i - 1, M' \setminus col(\mathrm{Child}(\mathcal{M})[i]))$ if $1 \le i \le t$, $M' \subseteq M$,

where the second term is considered only when $col(\mathrm{Child}(\mathcal{M})[i]) \subseteq M'$.

The algorithm returns True iff $D(t, M) = \text{True}$. Informally speaking, the first part of the computation of $D(i, M')$ ignores the $i$-th child of $\mathcal{M}$ while the second part adds this child into the potential solution. □
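
The dynamic programming step for a single series or parallel node can be sketched as follows. This is an illustration under assumptions rather than the exact procedure of the proof: the function names are hypothetical, the children are given directly as lists of colors (computing $T(G)$ is not shown), and instead of filling the table $D$ cell by cell the sketch keeps, for each $i$, the set of multisets $M'$ for which $D(i, M')$ would be True, which is an equivalent forward formulation.

```python
from collections import Counter


def freeze(counter):
    """Hashable representation of a multiset (colors must be sortable)."""
    return tuple(sorted(counter.items()))


def is_submultiset(small, big):
    return all(small[c] <= big[c] for c in small)


def union_of_children_matches(children_colors, motif):
    """Decide whether some union of children has exactly the motif as color multiset.

    children_colors: list of lists of colors, one list per child of the node
    motif:           list of colors, read as a multiset
    """
    motif = Counter(motif)
    # Discard children whose colors are not a sub-multiset of the motif
    kept = [Counter(c) for c in children_colors]
    kept = [c for c in kept if is_submultiset(c, motif)]

    # reachable holds every M' with D(i, M') = True; D(0, .) is True only for the empty multiset
    reachable = {freeze(Counter())}
    for child in kept:
        extended = set()
        for state in reachable:
            total = Counter(dict(state)) + child      # add the i-th child
            if is_submultiset(total, motif):
                extended.add(freeze(total))
        reachable |= extended                          # "ignore child" or "take child"
    return freeze(motif) in reachable


# Hypothetical node with four children
children = [["a"], ["b", "b"], ["a", "c"], ["z"]]
```

With these children, the motif {a, a, b, b, c} is matched by the union of the first three children, whereas the motif {a, b} is not matched by any union. Since at most $2^k$ sub-multisets of $M$ exist, the set `reachable` never exceeds that size, which mirrors the $2^k$ columns of the table.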

Corollary 2. Module Graph Motif is in FPT when parameterized by $(k, |C|)$.

Proof. Note that, by definition of the motif $M$, for each color $c \in C$, $occ_M(c) \le k$. Thus, the number of multisets $M'$ such that $M' \subseteq M$ is less than $k^{|C|}$. The time complexity of the algorithm in Proposition 4 is bounded by $O(k^{|C|} |V|^2)$. □

This corollary is quite surprising and shows a fundamental difference with Exact Graph Motif. Indeed, recall that this problem is NP-complete, even when the motif is built over two different colors [11].

Let us now show that even when a set of colors is associated to each node of the graph, the problem is still in the FPT class. It is indeed biologically relevant to consider many functions for the same reaction in a metabolic network or to consider more than one homology for a protein in a PPI network [17,3]. A version of Exact Graph Motif with a set of colors for each graph node has been defined, and thus, we can introduce the analogous problem List-Colored Module Graph Motif.

• Input: A graph $G = (V, E)$, an integer $k$, a set of colors $C$, a multiset $M$ over $C$, a function $col : V \to 2^C$ giving a set of colors for each node of $V$.

• Output: A subset $V' \subseteq V$ such that (i) $|V'| = k$, (ii) $V'$ is a module of $G$ and (iii) there is a bijection $f : V' \to M$ such that $\forall v \in V'$, $f(v) \in col(v)$.
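
Condition (iii) alone amounts to a perfect matching between the nodes of $V'$ and the occurrences of colors in $M$. The sketch below checks only that condition for a given candidate, using the classical augmenting-path method for bipartite matching; it is not the algorithm behind Proposition 5, and the function name and input format are assumptions.

```python
def has_list_coloring_bijection(color_sets, motif):
    """Check whether each node can be assigned a distinct motif slot of an allowed color.

    color_sets: list of sets, color_sets[v] being col(v) for the v-th node of V'
    motif:      list of colors, read as a multiset (one slot per occurrence)
    """
    if len(color_sets) != len(motif):
        return False
    match = [None] * len(motif)       # match[s] = node currently using motif slot s

    def try_assign(v, visited):
        # Try to give node v a slot, re-routing earlier nodes along an augmenting path
        for s, color in enumerate(motif):
            if color in color_sets[v] and s not in visited:
                visited.add(s)
                if match[s] is None or try_assign(match[s], visited):
                    match[s] = v
                    return True
        return False

    return all(try_assign(v, set()) for v in range(len(color_sets)))
```

For instance, two nodes with color sets {a, b} and {a} can realize the motif {a, b}: the second node takes a, which forces the first to take b. Two nodes that both only allow a cannot realize {a, b}.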

Proposition 5. List-Colored Module Graph Motif is in the FPT class.

4.5 Open problems

Clearly, the noise in the biological data implies that searching for exact occurrences of modules is too restrictive for a practical evaluation. Indeed, a single false positive or false negative can suppress a potential solution. Adding flexibility, as in the variants of Exact Graph Motif, seems essential. Deletions can be easily handled, but what about the insertions of colors or of nodes not in a module? It would also be interesting to know if Module Graph Motif is W[1]-hard if the parameter is the number of colors in the motif, as for Exact Graph Motif, or if using modularity changes the complexity class.

The complexity of the algorithm of Proposition 5 is not satisfying for practical 
issues. We believe that this complexity can be improved by the use of multilinear 
monomials detection [16]. 
Appendix

Lemma 1. If there is a solution $V'_I \subseteq V_I$ for $I$, then there is a solution $V' \subseteq V$ for $I'$ such that $|V'| \ge |V'_I| \cdot |V_I|^2$.

Proof. We build $V'$ as follows: $V' = \{r\} \cup \{v_i^e, v_i^j : v_i \in V'_I,\ e \in \mathrm{adj}(v_i),\ 1 \le j \le |V_I|^2\}$. In other words, we add in $V'$ the root $r$ of the tree and all the paths corresponding to the nodes of $V'_I$ (see also Figure 1).

Let us prove that $V'$ is a solution for $I'$ such that $|V'| \ge |V'_I| \cdot |V_I|^2$. Since the root is in the solution, $G[V']$ is connected. Moreover, the colors of $V'$ are all distinct, therefore the solution is colorful. Indeed, if there are two distinct nodes $u, v \in V'$ such that $col(u) = col(v)$, then they are the two copies associated with some edge of $E_I$ whose endpoints are both in $V'_I$, which is a contradiction since $V'_I$ is a solution for Max Independent Set. Finally, we bound the size of $V'$ by observing that for each $v \in V'_I$, we add the set of nodes in the path corresponding to $v$, which is of size $d(v) + |V_I|^2 \ge |V_I|^2$. □



Lemma 2. If there is a solution $V' \subseteq V$ for $I'$, then there is a solution $V'_I \subseteq V_I$ for $I$ such that $|V'_I| \ge \left\lceil \frac{|V'| - 2|E_I| - 1}{|V_I|^2} \right\rceil$.

Proof. For each $1 \le i \le |V_I|$, we add $v_i$ in $V'_I$ iff all the nodes $v_i^j$, $1 \le j \le |V_I|^2$, and $v_i^e$, $e \in \mathrm{adj}(v_i)$, are in $V'$. In other words, we add $v_i$ in $V'_I$ if the whole path corresponding to this node is in $V'$.

Let us prove that $V'_I$ is a solution for $I$ such that $|V'_I| \ge \left\lceil \frac{|V'| - 2|E_I| - 1}{|V_I|^2} \right\rceil$. If there are $v_i, v_j \in V'_I$ such that $\{v_i, v_j\} = e \in E_I$, then $v_i^e$ and $v_j^e$ are in $V'$. It is impossible since $col(v_i^e) = col(v_j^e)$ and since all the colors of $V'$ must be distinct to be a solution for Max Graph Motif. Consequently, $V'_I$ is an independent set. There are $\left\lceil \frac{|V'| - 2|E_I| - 1}{|V_I|^2} \right\rceil$ whole paths in $V'$. Indeed, by removing $2|E_I| + 1$ from the whole number of nodes in the solution, we bound the number of nodes of type $v_i^j$ (recall that $|V| = 1 + 2|E_I| + |V_I| \cdot |V_I|^2$). □



Lemma 3. If there is a solution $S'$ for an instance $I$ of Min Set Cover, there is a solution for the instance $I'$ of Min Substitute Graph Motif with $|S'|$ substitutions.

Proof. Let $S' \subseteq S$ be a solution for $I$. Given a total order on the subsets of $S$, for each $1 \le k \le |X|$ denote by $S^k_{min}$ the subset such that (i) $S^k_{min} \in S'$, and (ii) $S^k_{min}$ is the first subset of $S'$ where $x_k$ is. Moreover, for each $S_i$, denote by $f_i$ the smallest index $j$ of $v_{i,j,t}$ such that $S_i = S^{e(i,j)}_{min}$.

The solution $V'$ is built as follows: $V' = \{r\} \cup \{v_i : S_i \in S'\} \cup \{v_{i,j,t} : S_i = S^{e(i,j)}_{min},\ j = f_i,\ 2 \le t \le |S| + 1\} \cup \{v_{i,j,t} : S_i = S^{e(i,j)}_{min},\ j \ne f_i,\ 1 \le t \le |S| + 1\}$. Less formally, we put in the solution the root, the set of nodes representing subsets $S_i$ of $S'$, together with the $|S| + 1$ copies of each node representing an $x_k$ (the one in the subset with minimal index in the solution), except for the element $x_k$ of $X$ with the lowest index in $S^k_{min}$, where only $|S|$ copies are in the solution.

The graph $G[V']$ is connected since the nodes $v_{i,j,t}$ are in the solution if and only if the node $v_i$ is also in the solution. Moreover, a node $v_i$ is in the solution if there is a $k$ such that $S_i = S^k_{min}$. There is thus an integer $f_i$ for which only $|S|$ copies of $v_{i,f_i,t}$ are in the solution. Therefore, by construction, the color $c_{e(i,f_i),1}$, which is in the motif, is substituted in the solution by $c_i$. On the whole, there are $|S'|$ substitutions, since the other colors of the motif are in the solution. □

Proposition 2. Unless P = NP, there is no polynomial approximation algorithm
for Min Substitute Graph Motif with a ratio lower than $c \log |V|$,
where $c$ is a constant, even when the motif is colorful and $G$ is a tree of depth 2.

Proof. The proof follows directly from Lemmas 3 and 4, and from the fact that
there is no approximation algorithm with a ratio lower than $c \log |X|$ for Min Set
Cover, unless P = NP [20]. Observe that the parameter is strictly the same
in the two instances $I$ and $I'$; the reduction is therefore an L-reduction. □

Proposition 3. Module Graph Motif is NP-complete even if $G$ is a collection
of paths of size 3 and $M$ is colorful.

Proof. Module Graph Motif is in NP since, given a set $V' \subseteq V$, one can check
in polynomial time if $V'$ is a module and if each color of $C$ appears exactly once
in $V'$. To prove its hardness, we propose a reduction from Exact Cover by
3-Sets (X3C). This special case of Set Cover is known to be NP-complete.
Recall that X3C is stated as follows: given a set $X = \{x_1, x_2, \dots, x_{3q}\}$ and
a collection $\mathcal{S} = \{S_1, \dots, S_{|\mathcal{S}|}\}$ of 3-element subsets of $X$, does $\mathcal{S}$ contain a
subcollection $\mathcal{S}' \subseteq \mathcal{S}$ such that each element of $X$ occurs in exactly one element
of $\mathcal{S}'$? The size of $X$ must be a multiple of three since a solution is a set of
triplets in which each element of $X$ must appear exactly once.
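
To make the definition of X3C concrete, the following is a minimal brute-force
sketch in Python. The function name `exact_cover_by_3_sets` and the encoding of
the elements of $X$ as the integers 1..3q are assumptions made for illustration
only; the search is exponential and plays no role in the reduction itself.

```python
from itertools import combinations

def exact_cover_by_3_sets(n_elements, subsets):
    """Return the indices of an exact cover, or None if there is none.

    Elements are assumed to be the integers 1..n_elements (illustrative encoding).
    """
    if n_elements % 3 != 0:
        return None  # |X| must be a multiple of three
    universe = set(range(1, n_elements + 1))
    q = n_elements // 3
    # An exact cover uses exactly q triplets, so only q-subsets are tested.
    for chosen in combinations(range(len(subsets)), q):
        picked = [x for i in chosen for x in subsets[i]]
        # 3q picked elements that cover the universe cannot contain a repeat.
        if len(picked) == n_elements and set(picked) == universe:
            return chosen
    return None
```

On the instance of Fig. 3, the call `exact_cover_by_3_sets(6, [{1, 3, 5}, {1, 2, 4}, {2, 4, 6}, {2, 5, 6}])`
selects the first and third triplets.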

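Let us now describe the construction of an instance $I' = (G, C, col)$ of Module
Graph Motif from an arbitrary instance $I = (X, \mathcal{S})$ of X3C (see also Figure 3).
The graph $G = (V, E)$ is built as follows: $V = \{v_i^j : 1 \le i \le |\mathcal{S}|, x_j \in S_i\}$,
$E = \{\{v_i^j, v_i^k\} \cup \{v_i^k, v_i^l\} : 1 \le i \le |\mathcal{S}|\}$, where $x_j, x_k, x_l$ denote the three
elements of $S_i$. Informally speaking, $G$ is a collection of $|\mathcal{S}|$ paths with three
nodes (recall that for each $1 \le i \le |\mathcal{S}|$, $|S_i| = 3$).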




Fig. 3: The graph $G$ built from $X = \{x_1, x_2, \dots, x_6\}$ (thus with $q = 2$) and
$\mathcal{S} = \{\{x_1, x_3, x_5\}, \{x_1, x_2, x_4\}, \{x_2, x_4, x_6\}, \{x_2, x_5, x_6\}\}$ (only the colors of the nodes are
written). By construction, the set of colors asked in any solution is $C = \{c_1, c_2, \dots, c_6\}$.

The set of colors is $C = \{c_i : 1 \le i \le |X|\}$. The coloration of $G$ is such that
$col(v_i^j) = c_j$. In other words, each node gets the color of the represented element
of $X$. We also consider the colorful motif $M = C$.

Let us now prove that if there is a solution for an instance $I$ of X3C, then
there is a solution for the instance $I'$ of Module Graph Motif. Given a solution
$\mathcal{S}' \subseteq \mathcal{S}$ for $I$, a solution $V'$ for $I'$ is built as follows: $V' = \{v_i^j : S_i \in \mathcal{S}', x_j \in S_i\}$.
Informally speaking, the solution contains the set of paths corresponding to the
chosen triplets in the solution for X3C. The set $V'$ is a module, and by definition
of a solution for $I$, each color of $\{c_i : 1 \le i \le |X|\}$ appears exactly once in $V'$.

Conversely, let us now prove that there is a solution for the instance $I$ of
X3C if there is a solution for the instance $I'$ of Module Graph Motif. First
observe that since $q \ge 1$, then $|X| \ge 3$ and therefore $|C| \ge 3$. A module of size
at least three in a collection of paths of size three must be a union
of paths of size three. Indeed, suppose by contradiction that there is a module
$\mathcal{M}$ of size at least three which is not a union of paths of size three. There is
thus a node $u \in \mathcal{M}$ such that at least one of its neighbors $v \in N(u)$ is not in $\mathcal{M}$
and $v$ separates $u$ from another node of $\mathcal{M}$. Therefore, $\mathcal{M}$ is not a module. The
solution is built as follows: $\mathcal{S}' = \{S_i : v_i^j \in V'\}$. Since the solution $V'$ is a union
of paths of size three, each triplet $S_i$ is either completely chosen in the solution
$\mathcal{S}'$ or absent. Moreover, since $V'$ is a solution, each color of $V'$ appears exactly
once. Therefore, each element of $X$ appears exactly once in $\mathcal{S}'$. □
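
The construction used in this proof can be sketched in Python as follows. This
is only an illustration under stated assumptions: a node $v_i^j$ is encoded as
the pair `(i, j)`, its color is the integer `j`, the three nodes of a path are
ordered by increasing `j` (any fixed order would do), and all function names
are hypothetical.

```python
def build_paths(subsets):
    """Adjacency of the path collection: one 3-node path per triplet S_i."""
    adj = {}
    for i, triple in enumerate(subsets):
        a, b, c = sorted(triple)  # arbitrary but fixed order along the path
        for j in (a, b, c):
            adj[(i, j)] = set()
        for u, v in (((i, a), (i, b)), ((i, b), (i, c))):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def is_module(adj, nodes):
    """A set is a module if each outside node is adjacent to all or none of it."""
    nodes = set(nodes)
    for w in adj:
        if w in nodes:
            continue
        seen = len(adj[w] & nodes)
        if seen not in (0, len(nodes)):
            return False
    return True

def solution_from_cover(subsets, chosen):
    """V' = { v_i^j : S_i is chosen, x_j in S_i }."""
    return {(i, j) for i in chosen for j in subsets[i]}

def is_colorful_solution(adj, nodes, n_colors):
    """Check that `nodes` is a module in which colors 1..n_colors occur once each."""
    colors = sorted(j for _, j in nodes)
    return is_module(adj, nodes) and colors == list(range(1, n_colors + 1))
```

With the triplets of Fig. 3, choosing the first and third triplets yields a
module containing each of the six colors once, whereas two nodes taken from a
single path do not form a module because the third node of that path is
adjacent to only one of them.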



Fig. 4: A sample graph in a) and the corresponding modular tree decomposition in b). 
Nodes of the tree are either series (s), parallel (p), prime (prime) or leaves.



Proposition 4. There is an algorithm for Module Graph Motif with a time
complexity of $O(2^k |V|^2)$ and a space complexity of $O(2^k |V|)$, where $k$ is the size
of the motif and of the solution.

Proof. Since $|M| = k$, observe that there are at most $2^k$ different multisets
$M'$ such that $M' \subseteq M$. We first build in polynomial time the modular tree
decomposition $T(G)$ from $G$. We repeat the following algorithm for each node
$\mathcal{M}$ of $T(G)$.

We start by testing if the set of the colors of $\mathcal{M}$ is exactly equal to the motif
$M$. If it is the case, the algorithm terminates. Otherwise, if $\mathcal{M}$ is a series or
parallel node, a module can be a union of its children. Given an arbitrary order
on its $t$ children, denote by $\mathrm{Child}(\mathcal{M})[i]$ the $i$-th child of $\mathcal{M}$. We then delete all
children $\mathcal{M}'$ of $\mathcal{M}$ such that $col(\mathcal{M}') \not\subseteq M$, where $col(\mathcal{M}')$ is the set of colors
of the nodes of $\mathcal{M}'$. Indeed, such a child cannot be in a solution considering $M$.
We note that the set of colors of each remaining child corresponds to a multiset
$M' \subseteq M$.

Since any union of children of $\mathcal{M}$ is a module of $G$, it is thus a potential
solution. We propose to test by dynamic programming if such a union corresponds
to a solution for $M$. We build a table $D(i, M')$, for $0 \le i \le t$ and $M' \subseteq M$.
Therefore, $D$ has $t + 1$ lines and $2^k$ columns. We fill this table as follows:

$D(0, M') = \text{True}$ if $M' = \{0, \dots, 0\}$, False otherwise,

$D(i, M') = D(i - 1, M') \lor D(i - 1, M' \setminus col(\mathrm{Child}(\mathcal{M})[i]))$ if $1 \le i \le t$, $M' \subseteq M$.

The algorithm returns True iff $D(t, M) = \text{True}$. Informally speaking, the first
part of the computation of $D(i, M')$ ignores the $i$-th child of $\mathcal{M}$, while the second
part adds this child to the potential solution.
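
The recurrence can be sketched in Python as follows. This is a hedged
illustration, not the authors' implementation: it uses a forward formulation
that stores, after each child, the set of multisets $M'$ for which $D(i, M')$
holds, it reads $\{0, \dots, 0\}$ as the empty multiset, and it assumes that
the second part of the recurrence applies only when the colors of the child
are contained in $M'$. The function names are hypothetical, and colors are
assumed to be mutually comparable values such as strings.

```python
from collections import Counter

def contained(small, big):
    """True if the multiset `small` is included in the multiset `big`."""
    return all(small[color] <= big[color] for color in small)

def union_of_children_matches(children_colors, motif):
    """Decide whether some union of children has exactly the colors of the motif."""
    motif = Counter(motif)
    children = [Counter(colors) for colors in children_colors]
    # Delete the children whose colors are not included in the motif.
    children = [child for child in children if contained(child, motif)]
    # reachable holds every M' with D(i, M') = True, stored as a sorted tuple.
    reachable = {()}  # D(0, M') is True only for the empty multiset
    for child in children:
        updated = set(reachable)  # first part: ignore the i-th child
        for multiset in reachable:  # second part: add the i-th child
            total = Counter(multiset) + child
            if contained(total, motif):
                updated.add(tuple(sorted(total.elements())))
        reachable = updated
    return tuple(sorted(motif.elements())) in reachable
```

Storing only the reachable multisets keeps the sketch short; a full table
indexed by $i$ and $M'$, as in the text, would give the same answer.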

The time and space complexities of the dynamic programming are $O(2^k |V|)$
since $D$ is of size at most $2^k |V|$ and the computation time for each element is
constant. Therefore, since the dynamic programming is launched in the worst
case on each node of $T(G)$, the whole time complexity is $O(2^k |V|^2)$.

It remains to show the correctness of the dynamic programming. Suppose
the existence of a module $\mathcal{M}'$ such that $col(\mathcal{M}') = M$. Then, either $\mathcal{M}'$ is
a strong module represented in a node of $T(G)$, or it is a union of $j$ modules
$\mathcal{M}'_1, \mathcal{M}'_2, \dots, \mathcal{M}'_j$, children of a module $\mathcal{M}$. Therefore,
$M \setminus (col(\mathcal{M}'_1) \cup col(\mathcal{M}'_2) \cup \dots \cup col(\mathcal{M}'_j)) = \{0, 0, \dots, 0\}$, and hence $D(t, M) = \text{True}$.

Conversely, if there is a module $\mathcal{M}$ such that $D(t, M) = \text{True}$, then there is
a union of the children of $\mathcal{M}$ such that the set of colors of these children is equal
to $M$. □

Proposition 5. List-Colored Module Graph Motif is in the FPT class. 

Proof. We first build the modular tree decomposition $T(G)$ from $G$. We repeat
the following algorithm for each node $\mathcal{M}$ of $T(G)$.

If $\mathcal{M}$ has fewer than $k$ nodes, we look for a bijection between the colors of $\mathcal{M}$
and $M$. To do so, we try all the possible combinations. In the worst case, there
are $c^k$ such combinations, where $c$ is the number of different colors in $M$ (thus
$c \le k$).

In the following, we can thus consider $\mathcal{M}$ with more than $k$ nodes. If it is
a prime node, we can ignore it since this node cannot be a solution for a motif
of size $k$. Otherwise, it is a series or parallel node, and a union of the children
can be a solution. Let us now show that the number of possible solutions is
exponential only in $k$, and that it is thus possible to try all the possibilities.

To do so, we first give a bound on the number of children of $\mathcal{M}$. There are at
most $k$ nodes in each child of $\mathcal{M}$ (otherwise, this child cannot be in a solution).
In each child of $\mathcal{M}$, there are at most $2^k$ different sets of colors associated to
each node. Since there are at most $k$ nodes in each child of $\mathcal{M}$, there are at most
$(2^k)^k$ different children of $\mathcal{M}$. The same child of $\mathcal{M}$ cannot occur more than $k$
times (otherwise, the next occurrences cannot be in a solution for a motif of size
$k$). Therefore, there are at most $k(2^k)^k$ children to consider for $\mathcal{M}$.

We have bounded the number of children of $\mathcal{M}$. We now choose the potential
union of children of $\mathcal{M}$ in the solution: we must choose $i$ among the $k(2^k)^k$
children, where $i$ goes from 1 to $k$. This is bounded by $(k(2^k)^k)^{k+1}$. Finally, for
each union of chosen children, there are $c$ possible colors for the nodes (there
are at most $k$ of them), which leads to at most $c^k$ tests. The overall complexity
of the algorithm is thus exponential only in $k$. □
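
The counting argument above can be written out as a short worked bound. This
is only a sketch of the arithmetic: the symbol $N$ is introduced here as
shorthand for the bound $k(2^k)^k$ on the number of children, and the factor
$c^k$ assumes that each of the at most $k$ chosen nodes is tried with each of
the $c$ colors.

```latex
\[
  N = k\,(2^k)^k, \qquad
  \sum_{i=1}^{k} \binom{N}{i} \;\le\; \sum_{i=1}^{k} N^{i} \;\le\; N^{k+1}
  = \bigl(k\,(2^k)^k\bigr)^{k+1},
\]
\[
  \text{tests per node of } T(G) \;\le\; N^{k+1} \cdot c^{k}
  \;\le\; \bigl(k\,(2^k)^k\bigr)^{k+1} \cdot k^{k}
  \quad (\text{since } c \le k).
\]
```

Both factors depend on $k$ only, which is what membership in FPT requires once
the polynomial cost of building $T(G)$ and of visiting its nodes is added.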



